Skip to content

MDEV-39791: Handle count aggregate optimization for replay purpose#5145

Merged
bsrikanth-mariadb merged 1 commit into
bb-12.3-MDEV-39368-test-replayfrom
13.1-MDEV-39791-handle-count-optimization-in-replay
Jun 4, 2026
Merged

MDEV-39791: Handle count aggregate optimization for replay purpose#5145
bsrikanth-mariadb merged 1 commit into
bb-12.3-MDEV-39368-test-replayfrom
13.1-MDEV-39791-handle-count-optimization-in-replay

Conversation

@bsrikanth-mariadb
Copy link
Copy Markdown
Contributor

@bsrikanth-mariadb bsrikanth-mariadb commented May 29, 2026

During replay, when get_exact_record_count() is invoked from opt_sum_query(),
hook the recorded table rows from the parsed context for each table,
during the calculation of count.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for recording and replaying count aggregate contexts (count_agg) within the optimizer context store/replay framework. While the implementation successfully integrates the new structures and serialization routines, several critical issues must be addressed. Most notably, there is a type confusion vulnerability in opt_sum.cc where a cast to Item_field* is performed without verifying the item type, which could lead to server crashes. Additionally, there are bugs involving the serialization of a pointer instead of the actual count value, incorrect string appending of a ulong counter, potential counter desynchronization during OOM events, and swapped error messages in the JSON parsing routines.

Comment thread sql/opt_sum.cc Outdated
Comment thread sql/opt_context_store_replay.cc Outdated
Comment thread sql/opt_context_store_replay.cc Outdated
Comment thread sql/opt_context_store_replay.cc Outdated
Comment thread sql/opt_context_store_replay.cc Outdated
Comment thread sql/opt_context_store_replay.cc Outdated
@bsrikanth-mariadb bsrikanth-mariadb force-pushed the 13.1-MDEV-39791-handle-count-optimization-in-replay branch 2 times, most recently from 5f4ea85 to 65823a0 Compare May 29, 2026 09:16
@spetrunia spetrunia force-pushed the bb-12.3-MDEV-39368-test-replay branch from 039fe52 to 90b9f07 Compare May 29, 2026 11:57
@bsrikanth-mariadb bsrikanth-mariadb force-pushed the 13.1-MDEV-39791-handle-count-optimization-in-replay branch 5 times, most recently from 438acc1 to fabfa22 Compare June 2, 2026 23:24
@spetrunia
Copy link
Copy Markdown
Member

MariaDB [test]> set optimizer_record_context=1;
Query OK, 0 rows affected (0.001 sec)

MariaDB [test]> create table t10 (a int not null) engine=myisam;
Query OK, 0 rows affected (0.014 sec)

MariaDB [test]> insert into t10 select seq from seq_1_to_3;
Query OK, 3 rows affected (0.004 sec)
Records: 3  Duplicates: 0  Warnings: 0
MariaDB [test]> explain select count(*) from t10;
+------+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                        |
+------+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
|    1 | SIMPLE      | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL | Select tables optimized away |
+------+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
1 row in set (0.001 sec)

^ this is not caught by capture at all.
This EXPLAIN output means that the query did use get_exact_record_count...

To check that I do have your patch, etc:

MariaDB [test]> explain select count(a) from t10;

Now, this does hits a breakpoint in Optimizer_context_recorder::record_count_agg, as expected.

@spetrunia
Copy link
Copy Markdown
Member

count_agg
record_count_agg
infuse_count_agg

This is a bad name. Please change to exact_row_count .

Comment thread sql/opt_sum.cc Outdated
Comment thread mysql-test/main/opt_context_replay_basic.result Outdated
Comment thread sql/opt_context_store_replay.cc Outdated
Comment thread sql/opt_context_store_replay.cc Outdated
Copy link
Copy Markdown
Member

@spetrunia spetrunia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix the above

@bsrikanth-mariadb bsrikanth-mariadb force-pushed the 13.1-MDEV-39791-handle-count-optimization-in-replay branch from fabfa22 to 80c607d Compare June 4, 2026 03:07
@bsrikanth-mariadb
Copy link
Copy Markdown
Contributor Author

count_agg
record_count_agg
infuse_count_agg

This is a bad name. Please change to exact_row_count .

sure. changed it.

@bsrikanth-mariadb
Copy link
Copy Markdown
Contributor Author

bsrikanth-mariadb commented Jun 4, 2026

MariaDB [test]> set optimizer_record_context=1;
Query OK, 0 rows affected (0.001 sec)

MariaDB [test]> create table t10 (a int not null) engine=myisam;
Query OK, 0 rows affected (0.014 sec)

MariaDB [test]> insert into t10 select seq from seq_1_to_3;
Query OK, 3 rows affected (0.004 sec)
Records: 3  Duplicates: 0  Warnings: 0
MariaDB [test]> explain select count(*) from t10;
+------+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                        |
+------+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
|    1 | SIMPLE      | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL | Select tables optimized away |
+------+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
1 row in set (0.001 sec)

^ this is not caught by capture at all. This EXPLAIN output means that the query did use get_exact_record_count...

To check that I do have your patch, etc:

MariaDB [test]> explain select count(a) from t10;

Now, this does hits a breakpoint in Optimizer_context_recorder::record_count_agg, as expected.

Yes, count(*) would also be hitting the opt_sum_query() call, and we could get the count. However, I see one problem. How could we get the table information if the expression inside the count() is not a field_type? For count(a) I could get table info from field a.

@spetrunia
Copy link
Copy Markdown
Member

spetrunia commented Jun 4, 2026

However, I see one problem. How could we get the table information if the expression inside the count() is not a field_type? For count(a) I could get table info from field a.

The optimizer already handles all this.

create table t10 (a int, b int not null) engine=myisam;
insert into t10 select seq,seq from seq_1_to_10;
create table t11 like t10;
insert into t11 select seq, seq from seq_1_to_3;

If the expression is not NULLable, it doesn't matter what specific columns are used in it:

MariaDB [test]> explain select count(t10.b+1) from t10, t11;
+------+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                        |
+------+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
|    1 | SIMPLE      | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL | Select tables optimized away |
+------+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
1 row in set (0.001 sec)
MariaDB [test]> explain select count(t10.b+t11.b) from t10, t11;
+------+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                        |
+------+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
|    1 | SIMPLE      | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL | Select tables optimized away |
+------+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
1 row in set (0.001 sec)

NULL-able expression disqualifies the optimization:

MariaDB [test]> explain select count(t10.b+t11.a) from t10, t11;
+------+-------------+-------+------+---------------+------+---------+------+------+------------------------------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                              |
+------+-------------+-------+------+---------------+------+---------+------+------+------------------------------------+
|    1 | SIMPLE      | t11   | ALL  | NULL          | NULL | NULL    | NULL | 3    |                                    |
|    1 | SIMPLE      | t10   | ALL  | NULL          | NULL | NULL    | NULL | 10   | Using join buffer (flat, BNL join) |
+------+-------------+-------+------+---------------+------+---------+------+------+------------------------------------+
2 rows in set (0.001 sec)

But do we need to think about all these details now?

AFAIU what we need to do is to place hooks to save and/or infuse the record count in get_exact_record_count(). And let the optimizer code handle the rest as it currently does it

@bsrikanth-mariadb
Copy link
Copy Markdown
Contributor Author

However, I see one problem. How could we get the table information if the expression inside the count() is not a field_type? For count(a) I could get table info from field a.

The optimizer already handles all this.

create table t10 (a int, b int not null) engine=myisam;
insert into t10 select seq,seq from seq_1_to_10;
create table t11 like t10;
insert into t11 select seq, seq from seq_1_to_3;

If the expression is not NULLable, it doesn't matter what specific columns are used in it:

MariaDB [test]> explain select count(t10.b+1) from t10, t11;
+------+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                        |
+------+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
|    1 | SIMPLE      | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL | Select tables optimized away |
+------+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
1 row in set (0.001 sec)
MariaDB [test]> explain select count(t10.b+t11.b) from t10, t11;
+------+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                        |
+------+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
|    1 | SIMPLE      | NULL  | NULL | NULL          | NULL | NULL    | NULL | NULL | Select tables optimized away |
+------+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
1 row in set (0.001 sec)

NULL-able expression disqualifies the optimization:

MariaDB [test]> explain select count(t10.b+t11.a) from t10, t11;
+------+-------------+-------+------+---------------+------+---------+------+------+------------------------------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra                              |
+------+-------------+-------+------+---------------+------+---------+------+------+------------------------------------+
|    1 | SIMPLE      | t11   | ALL  | NULL          | NULL | NULL    | NULL | 3    |                                    |
|    1 | SIMPLE      | t10   | ALL  | NULL          | NULL | NULL    | NULL | 10   | Using join buffer (flat, BNL join) |
+------+-------------+-------+------+---------------+------+---------+------+------+------------------------------------+
2 rows in set (0.001 sec)

But do we need to think about all these details now?

AFAIU what we need to do is to place hooks to save and/or infuse the record count in get_exact_record_count(). And let the optimizer code handle the rest as it currently does it

okay. So, get_exact_record_count() wouldn't take any arguments such as TABLE*. Instead, depending on the invocation/call number we return the count. Is it?

@bsrikanth-mariadb bsrikanth-mariadb force-pushed the 13.1-MDEV-39791-handle-count-optimization-in-replay branch 3 times, most recently from 1299232 to 5c9ac44 Compare June 4, 2026 11:34
During replay, when get_exact_record_count() is invoked from opt_sum_query(),
hook the recorded table rows from the parsed context for each table,
during the calculation of count.
@bsrikanth-mariadb bsrikanth-mariadb force-pushed the 13.1-MDEV-39791-handle-count-optimization-in-replay branch from 5c9ac44 to cf18a5b Compare June 4, 2026 12:50
@bsrikanth-mariadb bsrikanth-mariadb merged commit cf18a5b into bb-12.3-MDEV-39368-test-replay Jun 4, 2026
1 of 3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

2 participants